Background: Cancer incidence rates and trends are a measure
of the cancer burden in the general population. We studied the
impact of reporting delay and reporting error on incidence rates
and trends for female breast, colorectal, lung/bronchus, and
prostate cancers and for melanoma. Methods: Based on statistical
models, we obtained reporting-adjusted (i.e., adjusted for both
reporting delay and reporting error) case counts for each
diagnosis year beginning in 1981 using reporting information for
patients diagnosed with cancer in 1981-1998 from nine cancer
registries that participate in the Surveillance, Epidemiology,
and End Results (SEER) program. Joinpoint linear regression was
used for trend analysis. All statistical tests are two-sided.
Results: Initial incidence case counts (i.e., after the standard
2-year delay) accounted for only 88%-97% of the estimated final
counts; it would take 4-17 years for 99% or more of the cancer
cases to be reported. The percent change between
reporting-adjusted and unadjusted cancer incidence rates for the
1998 diagnosis year ranged from 3% for colorectal cancers to 14%
for melanoma in whites and for prostate cancer in black males.
Reporting-adjusted current incidence trends for breast cancer
and lung/bronchus cancer in white females showed statistically
significant increases (estimated annual percent change [EAPC] =
0.6%, 95% confidence interval [CI] = 0.1% to 1.2%, and EAPC =
1.2%, 95% CI = 0.7% to 1.6%, respectively), whereas trends for
these cancers using unadjusted incidence rates were not
statistically significantly different from zero (EAPC = 0.4%,
95% CI = -0.1% to 0.9%, and EAPC = 0.5%, 95% CI = -0.1% to 1.1%,
respectively). Reporting-adjusted melanoma incidence rates for
white males showed a statistically significant increase since
1981 (EAPC = 4.1%, 95% CI = 3.8% to 4.4%), in contrast to the
unadjusted incidence rate, which was most consistent with a flat
or downward trend (EAPC = -4.2%, 95% CI = -11.1% to 3.3%) after
1996. Conclusions: Reporting-adjusted cancer incidence rates are
valuable in precisely determining current cancer incidence rates
and trends and in monitoring the timeliness of data collection.
Ignoring reporting delay and reporting error may produce
downwardly biased cancer incidence trends, particularly in the
most recent diagnosis years. [J Natl Cancer Inst
2002;94:1537-45]


Cancer incidence rates measure the impact of cancer in the
general population. The Surveillance, Epidemiology, and End
Results (SEER) program at the National Cancer Institute (NCI)
has been actively collecting and reporting cancer incidence and
survival data since 1973. The goal of a cancer registry that
participates in the SEER program is to record every primary
cancer in its catchment area (i.e., the geographic region for
which the registry is responsible) in a timely and accurate
manner, along with other information about each case, such as
type of cancer, date of diagnosis, sex, race, and stage.


However, reporting delay and reporting error hamper timely
and accurate case reporting. Reporting delay time refers to the
time elapsed before a diagnosed cancer case is reported to the
NCI. NCI's contract with registries in the SEER program
specifies that the registries have up to 19 months to report
cancer cases in a manner complete enough that the NCI can
present the data publicly. For example, cancer cases diagnosed
in 1998 were first reported to the NCI by August 2000, and data
about those cases were released to the public in April 2001.
Hence, the standard delay time between cancer diagnosis year and
the first report of cancer incidence data to the public is about
2 years. However, as either new cancer cases are discovered or
erroneous cases are detected in the existing SEER data,
collected information on cases from prior diagnosis years is
updated in subsequent releases of the SEER data. For example,
the SEER data released in April 2001 included updated cases
diagnosed from 1973 through 1997 that were reported to the NCI
by August 1999 and released in April 2000.

We refer to the addition of new cancer cases after the standard
delay time as reporting delay and to the deletion of erroneous
cases as reporting error. Modifying collected information on
reported cases may cause both reporting delay and reporting
error. For example, changing a prostate cancer patient's race
from white to black will cause both reporting delay and
reporting error because a new cancer case for blacks is added
and a previous case for whites has to be deleted. Reporting
delay causes incidence rates to be underestimated and reporting
error causes incidence rates to be overestimated; consequently,
it is essential to adjust for both reporting delay and reporting
error to obtain accurate cancer incidence rates and trends.

Reporting-adjustment of cancer incidence rates is important
because of the special interest in the incidence trend
(especially any change in trend) in the most recent diagnosis
years. The trend in cancer incidence from the most recent
diagnosis years is generally biased, however, because the most
recent diagnosis year will have the largest net underreporting
of cases, with smaller amounts of underreporting for less recent
years. For example, Horm and Kessler (1) reported that lung
cancer incidence rates for white males declined 4.1% from 82.7
to 79.3 cases per 100 000 person-years from 1982 to 1983 (rates
age-adjusted to the 1970 U.S. standard), indicating a
long-awaited downturn in lung cancer incidence consistent with
an earlier decline in smoking rates. However, the current
estimates for lung cancer incidence in white males are 83.8 and
82.2 in 1982 and 1983, respectively; that is, a decline in
incidence of only 1.9% (2).

The idea behind modeling reporting delay and reporting error
is to adjust the current case count to account for anticipated
future corrections (both additions and deletions) to the data.
In this study, we simultaneously model reporting delay and
reporting error in SEER incidence data from diagnosis years 1981
through 1998 (using the approach of Midthune DN, Fay MP, Clegg
LX, Feuer EJ: unpublished data) and examine the impact of
adjustment for both reporting delay and reporting error on
cancer incidence rates and trends. Hereafter, we use the term
reporting-adjustment to indicate adjustment for both reporting
delay and reporting error, and we report age-adjusted cancer
incidence rates and trends with and without
reporting-adjustment. To our knowledge, this study is the first
attempt to formally adjust for biases in cancer incidence rates
and trends resulting from both reporting delay and reporting
error. Our analysis is focused on five cancer sites: female
breast, colorectal, lung/bronchus, prostate, and melanoma. We
included melanoma because it has a longer reporting delay than
other cancer sites, presumably because of the difficulties
associated with reporting a cancer that is increasingly
diagnosed in a non-hospital setting. We chose the other four
cancer sites because of their high incidence rates and thus
their importance in cancer control.

Methods 

Study Population and Data Source 

The SEER program currently collects cancer incidence and
survival information from 10 population-based cancer registries
that encompass nearly 14% of the total U.S. population. This
study used data from nine cancer registries that are responsible
for data collection in the states of Connecticut, Hawaii, Iowa,
New Mexico, and Utah and the metropolitan areas of Atlanta,
Detroit, Seattle-Puget Sound, and San Francisco-Oakland. Based
on 1990 census data, these nine SEER registry areas cover more
than 9% of the U.S. population. These nine registries have been
participating in the SEER program since 1973, except for
Atlanta, which has participated in the program since 1975. This
study includes all invasive primary cancers of the female
breast, colorectal, lung/bronchus, prostate, and melanoma that
were diagnosed in the nine SEER geographic areas between 1973
and 1998. For the reporting delay models, we used only patients
diagnosed from 1981 through 1998, the years when data on
reporting information were readily available. Cancer sites and
morphology were coded based on the International Classification
of Diseases for Oncology, second edition (ICD-O-2).

Data Structure for Delay Models 

For a particular cancer, the data used for modeling were
two-dimensional triangular tables of initial incidence case
counts reported at the 2-year standard delay time and the
additions and deletions of cases modified from the previous data
submissions. Table 1 shows the summarized data for invasive
melanoma. The data used in the delay models in this study
correspond to a series of similar tables, one for each subgroup
(i = 1, ..., I), where i indexes the subgroup and I is the total
number of subgroups. A subgroup is defined by every combination
of the three variables that are commonly used for standard
reporting (i.e., age at diagnosis, race, and sex) and the two
variables for which reporting delay and error are likely to vary
(i.e., registry areas and hospital versus non-hospital reporting
source). A change in one of these variables for a particular
case yields a reporting error in the subgroup from which the
case is deleted and a reporting delay in the subgroup to which
it is added.

Except at the standard delay time of 2 years, where only the
initial incidence case count is displayed, there are two case
counts within each cell of Table 1: the additions (i.e., the
number of melanoma cases added since the previous data
submission) and the deletions (i.e., the number of previously
reported erroneous melanoma cases deleted). For example, 1817
melanomas diagnosed in 1981 were initially reported to the NCI
in 1983. In the reporting year 1984, an additional 84 melanoma
cases diagnosed in 1981 were reported, and 51 melanoma cases
were deleted from the 1817 cases initially reported in 1983. In
the reporting year 2000, after 19 years of delay, 10 more
melanoma cases diagnosed in 1981 were added and five melanoma
cases were deleted that had been previously reported erroneously
as diagnosed in 1981 some time between 1983 and 1999. As of
2000, the net count for melanoma cases diagnosed in 1981 was
2048, rather than the initial 1817 (89% of 2048) reported in
1983. Over the course of 19 years, 443 new melanoma cases were
added and 207 erroneous melanoma cases were deleted. It should
be noted that there was a large amount of reporting error in
1993 (i.e., many cases deleted from earlier years). This
irregularity in the data may reflect the possibility that some
melanoma cases previously classified as "white" for those
individuals of "unknown" race (because 99% of reported melanomas
are from white individuals) were reclassified as "unknown" in
1993. Because we consider only melanomas for whites, we see
deletions from "white" but not additions for "unknown." We
postulate that the reclassification of cases based on race
contributed to the large amount of reporting error in 1993,
because when we combined the melanoma case counts for "white"
and "unknown" so that we did not have to consider changes from
"white" to "unknown" to be reporting errors, we no longer saw
this irregular result. We show the detailed data for melanoma
because it has the worst case of reporting delays and reporting
errors.
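
The bookkeeping behind one row of such a table can be sketched in a few lines of Python. This is only an illustration of how a net count follows from the initial count plus additions minus deletions; the names DelayCell, net_count, and completeness are hypothetical, and only the figures quoted above for the 1981 diagnosis year are used.

```python
from dataclasses import dataclass


@dataclass
class DelayCell:
    """Additions and deletions recorded in one reporting year (hypothetical container)."""
    additions: int
    deletions: int


def net_count(initial: int, updates: list[DelayCell]) -> int:
    """Initial count at the 2-year delay plus all later additions minus deletions."""
    return initial + sum(cell.additions - cell.deletions for cell in updates)


def completeness(observed: int, final: int) -> float:
    """Fraction of the final net count that had been reported at an earlier time."""
    return observed / final


# Melanoma diagnosed in 1981: 1817 cases at the initial 1983 report,
# then 84 additions and 51 deletions in reporting year 1984.
after_1984 = net_count(1817, [DelayCell(additions=84, deletions=51)])
print(after_1984)  # 1850

# Initial count as a share of the net count of 2048 reported as of 2000.
print(round(100 * completeness(1817, 2048)))  # 89
```

Storing additions and deletions separately, rather than only the net change, matters because the two are modeled by different distributions in the delay models.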

Modeling Delay Distributions of Cancer Case Reporting 

Detailed formulations of the statistical models for reporting
delay and reporting error are reported elsewhere (Midthune DN,
Fay MP, Clegg LX, Feuer EJ: unpublished data). Briefly, a delay
distribution models the likelihood of a cancer being reported
after a delay of d years (d = 2, 3, ..., 19) after diagnosis.
The number of cancer cases reported at each delay year (i.e.,
initial case count or additional cases) was assumed to follow a
Poisson distribution. Cases were deleted as corrections to the
data were made, and the number of cases deleted at delay time d
was modeled as a binomial distribution conditional on the net
number of cases (i.e., additions minus deletions) through delay
time d - 1.
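
The two distributional assumptions can be written compactly as follows. This is a notational sketch only: the full formulation is in the unpublished work cited above, and the symbols N_d, D_d, M_d, \mu, p_d, and q_d, as well as the Poisson mean being written as the product \mu p_d, are assumptions introduced here for illustration.

```latex
% Notational sketch; all symbols are illustrative assumptions.
% N_d : cases added at delay d (N_2 is the initial case count)
% D_d : cases deleted at delay d (no deletions at d = 2)
% M_d : net count (additions minus deletions) through delay d
\begin{align*}
N_d &\sim \mathrm{Poisson}(\mu\, p_d), & d &= 2, 3, \ldots, 19, \\
M_d &= \sum_{j=2}^{d} \left( N_j - D_j \right), & D_2 &= 0, \\
D_d \mid M_{d-1} &\sim \mathrm{Binomial}\!\left( M_{d-1},\, q_d \right), & d &= 3, \ldots, 19 .
\end{align*}
```

Under this reading, \mu would be the expected number of cases ever added for a diagnosis year, p_d the probability that a case is first reported at delay d, and q_d the probability that a case still on file is deleted at delay d.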

Delay distributions were modeled as a function of covariates
using a discrete-time proportional hazards model and were
stratified by those variables for which the assumption of
proportional hazards was not valid. For the models used in this
study, diagnosis year, delay times, race (black or white), and
registry were included as potential covariates. Reporting source
(i.e., hospital or non-hospital) was designated as a potential
stratification variable rather than a covariate because of
possibly large nonproportional differences in the delay
distribution. To avoid model identifiability problems that come
with modeling beyond the end of the observed data, we assumed
that the reporting delays and reporting errors occurring more
than 19 years after diagnosis were either few enough to ignore
or balanced each other out.

Table 1. Surveillance, Epidemiology, and End Results (SEER)
Program incidence data for cases of invasive melanoma reported
to the NCI

Delay Model Selection 

For each cancer site, several different models were considered.
For reporting source, we either stratified by reporting source
or not. The models also differed in how the four covariates were
handled. Registry and race were each either included in the
model or not. Diagnosis year was 1) left out of the model, 2)
modeled as a continuous covariate, or 3) modeled as a
categorized covariate: 1981-1985, 1986-1990, or later than 1990.
The delay times were modeled by allowing different regression
parameters for the first few delay times up to some delay time k
and then having one regression parameter for all delay times
greater than or equal to k. The three methods of modeling the
delay time have k = 3, k = 5, or k = 10. This approach has the
advantage of smoothing the delay distribution where the data are
sparse because there are fewer data with long delay times. The
model with k = 3 smoothes more of the tail of the delay
distribution (i.e., the portion of the distribution associated
with long delay times) than the model with k = 10. Thus, for
each cancer site except melanoma, 72 models were fit to the data
as a function of reporting source (two formulations: stratified
by hospital or non-hospital or not stratified), registry (two
formulations: included or not), race (two formulations: included
or not), diagnosis year (three formulations: not included in
model, modeled as a continuous covariate, or modeled as a
categorized covariate), and delay time (three formulations:
delay time = 2, ..., delay time = k - 1, and delay time >= k for
k = 3, k = 5, and k = 10); that is, there were 72 possible model
combinations (2 x 2 x 2 x 3 x 3 formulations). Race was not
modeled for melanoma, because we used data only from whites.
Hence, only 36 models were considered for melanoma.
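
The count of candidate models follows directly from the Cartesian product of the formulations listed above. The short Python sketch below enumerates them; the option labels and the function names candidate_models and delay_levels are hypothetical, chosen here only to make the 2 x 2 x 2 x 3 x 3 structure concrete.

```python
from itertools import product

# Formulations per modeling choice, as listed in the text (labels are illustrative).
REPORTING_SOURCE = ("stratified", "not stratified")
REGISTRY = ("included", "excluded")
RACE = ("included", "excluded")
DIAGNOSIS_YEAR = ("excluded", "continuous", "categorized")
DELAY_CUTOFF_K = (3, 5, 10)


def candidate_models(include_race: bool = True) -> list[tuple]:
    """All combinations of formulations; race is fixed to 'excluded' when not modeled."""
    race_options = RACE if include_race else ("excluded",)
    return list(product(REPORTING_SOURCE, REGISTRY, race_options,
                        DIAGNOSIS_YEAR, DELAY_CUTOFF_K))


def delay_levels(k: int) -> list[str]:
    """Delay times given their own parameter, plus one pooled level for delays >= k."""
    return [str(d) for d in range(2, k)] + [f">={k}"]


print(len(candidate_models()))                    # 72
print(len(candidate_models(include_race=False)))  # 36 (melanoma, whites only)
print(delay_levels(5))                            # ['2', '3', '4', '>=5']
```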

Table 2. Variables included in the final delay models, by cancer site

Cancer site      Variables

Female breast    Delay = 2 years, ..., delay = 4 years, delay >=5 years,
                 registry areas
Colorectal       Delay = 2 years, ..., delay = 4 years, delay >=5 years
Lung/bronchus    Delay = 2 years, ..., delay = 4 years, delay >=5 years,
                 race
Melanoma         Delay = 2 years, delay >=3 years, diagnosed in
                 1986-1990, diagnosed in 1991 or later, stratified by
                 reporting source (hospital or non-hospital)
Prostate         Delay = 2 years, ..., delay = 9 years, delay >=10 years,
                 registry areas, diagnosis year

Each of the 72 (or 36 for melanoma) delay models was fit by
the method of maximum likelihood (Midthune DN, Fay MP, Clegg LX,
Feuer EJ: unpublished data). For model selection, we fit the
models using incidence data from each of the annual August data
submissions to the NCI between 1983 and 1998 (i.e.,
corresponding to diagnosis years from 1981 through 1996), and
then we predicted the case counts for the 1998 diagnosis year
and compared the predicted case counts with those reported in
the 2000 data submission. For each cancer site, the model that
minimized the sum of squared prediction errors was chosen as the
default final model. However, to choose a more parsimonious
model when the default model contained 24 or more parameters, we
added a selection step in which possible competing models were
selected using the following criteria: 1) the competing model
had fewer than half of the number of parameters of the default
model, and 2) the percent change between the prediction errors
of the competing and the default models per extra parameter
(i.e., percent change in prediction errors divided by the
difference in the numbers of parameters between the two models)
was less than 1%. If more than one competing model met the
criteria, the model with the smallest percentage change per
extra parameter was generally selected, although some ad hoc
adjustments were also used. The chosen model was then refit
using all data (reporting years from 1983 through 2000 for cases
diagnosed from 1981 through 1998) to estimate the delay
distribution. The 19-year reporting-adjusted net estimates of
the cancer counts for each diagnosis year were then calculated
based on the estimated delay distributions. The estimated
standard errors for reporting-adjusted cancer counts accounted
for variation in estimated delay probabilities. Case-reporting
completeness for each diagnosis year was calculated as the ratio
of the observed net count to the reporting-adjusted net count.
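
The two-step selection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: FittedModel and select_model are hypothetical names, the sum of squared prediction errors (assumed positive) stands in for "prediction error" in the percent-change criterion, the ad hoc adjustments mentioned in the text are not modeled, and the example numbers are invented purely to exercise the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FittedModel:
    """Hypothetical summary of one fitted delay model."""
    name: str
    n_params: int
    sse: float  # sum of squared prediction errors for the held-out 1998 diagnosis year


def pct_change_per_extra_param(competitor: FittedModel, default: FittedModel) -> float:
    """Percent change in prediction error divided by the difference in parameter counts."""
    pct_change = 100.0 * (competitor.sse - default.sse) / default.sse
    return pct_change / (default.n_params - competitor.n_params)


def select_model(models: list[FittedModel],
                 param_threshold: int = 24,
                 max_pct_per_param: float = 1.0) -> FittedModel:
    # Step 1: the default model minimizes the sum of squared prediction errors.
    default = min(models, key=lambda m: m.sse)
    if default.n_params < param_threshold:
        return default
    # Step 2: look for a much smaller model that predicts almost as well.
    competitors = [
        m for m in models
        if m.n_params < default.n_params / 2
        and pct_change_per_extra_param(m, default) < max_pct_per_param
    ]
    if not competitors:
        return default
    return min(competitors, key=lambda m: pct_change_per_extra_param(m, default))


# Invented numbers, for illustration only.
models = [
    FittedModel("full", n_params=30, sse=100.0),
    FittedModel("small", n_params=6, sse=110.0),   # 10% worse over 24 fewer parameters
    FittedModel("tiny", n_params=3, sse=160.0),    # 60% worse over 27 fewer parameters
]
print(select_model(models).name)  # small
```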

Calculation of Incidence Rates and Trends 

Fig. 1. Percent completeness of cancer incidence counts by diagnosis year for
major cancer sites. Completeness was calculated as the ratio of case counts in
each diagnosis year to the asymptotic count after 19 years, i.e., the delay
distribution was assumed to be complete after 19 years. The 1981 diagnosis year
has 100% completeness by model assumption. Major cancer sites are as follows:
female breast (x), colon/rectum (■), lung/bronchus (♦), prostate (▲), and
melanoma (•). Data are from the Surveillance, Epidemiology and End Results
(SEER) program August 2000 submission.

Age-adjusted (using the 1970 U.S. population as the standard)
cancer incidence rates were calculated using cancer case counts
with and without reporting-adjustment. Joinpoint linear
regression (3,4) was used to fit connected linear trends on a
log scale (that is, with a constant estimated annual percentage
change [EAPC] in each segment) to the 1973 through 1998
age-adjusted incidence rates both with and without
reporting-adjustment. Because the delay distribution was assumed
complete after 19 years, incidence rates for diagnosis years
prior to 1981 were not reporting-adjusted. In joinpoint
regression analyses, up to three change points (i.e., four trend
line segments) were allowed, and these were modeled to fall at
either whole years or midway between diagnosis years. Change
points were constrained to be at least 2 years away from both
the beginning and the end of the data series, and they had to be
at least 2 years apart. Models were fit using weighted least
squares (weighted by appropriate variances of age-adjusted
incidence rates) in version 2.6 of the joinpoint regression
software developed by NCI (available at
http://srab.cancer.gov/joinpoint/).
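
Within a single trend segment, a linear fit on the log scale with slope b corresponds to an annual percentage change of 100 x (exp(b) - 1). The sketch below shows that relationship for one segment only, using a weighted least-squares slope; it does not search for change points and is not the NCI joinpoint software. The function names and the synthetic rates are assumptions made for illustration.

```python
import math


def weighted_log_linear_slope(years, rates, weights=None):
    """Weighted least-squares slope of log(rate) on year for one trend segment."""
    if weights is None:
        weights = [1.0] * len(years)
    log_rates = [math.log(r) for r in rates]
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, years)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, log_rates)) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, years, log_rates))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, years))
    return sxy / sxx


def eapc_from_slope(slope: float) -> float:
    """Estimated annual percentage change implied by a log-scale slope."""
    return 100.0 * (math.exp(slope) - 1.0)


# Synthetic rates that grow by exactly 4.1% per year (illustrative, not SEER data).
years = list(range(1981, 1999))
rates = [10.0 * 1.041 ** (y - 1981) for y in years]
slope = weighted_log_linear_slope(years, rates)
print(round(eapc_from_slope(slope), 3))  # 4.1
```

In a real analysis the weights would come from the variances of the age-adjusted rates, and the segment boundaries would be estimated rather than fixed in advance.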

Results 

Timeliness of Cancer Case Reporting 

The final delay distribution models varied by cancer site
(Table 2), with only the melanoma delay model being stratified
by reporting source. Based on the SEER August 2000 data
submission, the 1998 incidence counts at the 2-year standard
delay were about 3%-4% below the estimated numbers of female
breast, colorectal, and lung/bronchus cancer cases, according to
our models (Fig. 1); more than 99% of estimated incidence counts
were reported for lung/bronchus cancers diagnosed in 1996
(4-year delay), for colorectal cancers diagnosed in 1995 (5-year
delay), and for female breast cancers diagnosed in 1991 (9-year
delay). However, for 1998 diagnoses, only 88% of estimated
melanomas and 89% of estimated prostate cancers were reported.
Moreover, it would take up to 11 years for 99% of prostate cases
diagnosed in 1989 to be reported and up to 17 years for 99% of
melanomas diagnosed in 1983 to be reported (Fig. 1). The "jumps"
in the percent completeness for melanoma in 1986 and 1991 were
caused by the inclusion of diagnosis year as a discrete
covariate in the delay model (diagnosis years prior to 1986,
1986-90, and after 1990), because there were incremental
improvements in timeliness of case reporting in each successive
time period, especially in the reporting of non-hospital cases.

Effect of Reporting-Adjustment on Cancer Incidence Rates 
and Trends 

Figs. 2-6 depict age-adjusted cancer incidence rates and
trends by site, with and without reporting-adjustment. These
figures show that reporting-adjustment tended to raise cancer
incidence rates in more current diagnosis years (i.e., near the
end of the data series) even when it did not cause changes in
the location or number of change points in the incidence trends.
The percent change between reporting-adjusted and unadjusted
cancer incidence rates in 1998 ranged from 3% for colorectal
cancer (regardless of race or sex), to 4% for female breast
cancer and lung cancer (regardless of race or sex), to 12% for
prostate cancer in white males, and to 14% for melanoma in
whites (regardless of sex) and prostate cancer in black males.
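
These percent changes are roughly what the completeness figures reported earlier imply. If a fraction c of the eventual cases has been reported, dividing the observed count by c raises it by 100 x (1/c - 1) percent. The sketch below applies that arithmetic to the approximate 1998 completeness values quoted in the text; it assumes, for illustration only, that completeness is uniform across age groups so that age-adjusted rates scale like counts, and the function name is hypothetical.

```python
def percent_increase_from_completeness(completeness: float) -> float:
    """Percent by which a count rises when divided by its reporting completeness."""
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be in (0, 1]")
    return 100.0 * (1.0 / completeness - 1.0)


# Approximate completeness of 1998 counts at the 2-year delay, from the text.
for site, c in [("melanoma", 0.88), ("prostate", 0.89),
                ("3% below estimate", 0.97), ("4% below estimate", 0.96)]:
    print(f"{site}: {percent_increase_from_completeness(c):.1f}%")
# melanoma: 13.6%
# prostate: 12.4%
# 3% below estimate: 3.1%
# 4% below estimate: 4.2%
```

The reported values differ slightly by race and sex because the fitted delay models include covariates such as registry area, which this simple calculation ignores.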

For female breast cancer, the reporting-adjusted incidence
rates for whites (Fig. 2) in the most recent years (i.e., the
last segment of the trend line between 1987 and 1998) resulted
in a statistically significant increasing incidence trend, with
an EAPC rate of 0.6% (95% CI = 0.1% to 1.2%), whereas the trend
for unadjusted incidence rates over the same time period was not
statistically significantly different from zero (EAPC rate =
0.4%, 95% CI = -0.1% to 0.9%). The reporting-adjusted incidence
rates also suggested an increasing trend (not statistically
significant) for colorectal cancer in white females (Fig. 3)
from 1996 at an EAPC rate of 2.8% (95% CI = -0.2% to 6.0%)
versus an EAPC rate of 0.9% (95% CI = -1.0% to 2.8%) for the
unadjusted rates. Similarly, for prostate cancer in white males
(Fig. 4) the reporting-adjusted incidence rates increased at an
EAPC rate of 2.2% (95% CI = -2.8% to 7.4%), whereas the
unadjusted rates showed a decrease from 1995 on with an EAPC
rate of -0.1% (95% CI = -3.1% to 2.9%).

Fig. 2. Incidence and reporting-adjusted rates for female breast cancer by
race. Rates are per 100 000 person-years and are age-adjusted to the 1970 U.S.
standard population. (•) = incidence data and (x) = reporting-adjusted data.
Regression lines are calculated using the joinpoint regression program. EAPC =
estimated annual percentage change in the regression line. Numbers in
parentheses are the 95% confidence intervals of the EAPC. Data are from the
Surveillance, Epidemiology and End Results (SEER) program August 2000
submission.

Fig. 3. Incidence and reporting-adjusted rates for colorectal cancer by race
and sex. Rates are per 100 000 person-years and are age-adjusted to the 1970
U.S. standard population. (•) = incidence data and (x) = reporting-adjusted
data. Regression lines are calculated using the joinpoint regression program.
EAPC = estimated annual percentage change in the regression line. Numbers in
parentheses are the 95% confidence intervals of the EAPC. Data are from the
Surveillance, Epidemiology and End Results (SEER) program August 2000
submission.

Fig. 4. Incidence and reporting-adjusted rates for prostate cancer by race.
Rates are per 100 000 person-years and are age-adjusted to the 1970 U.S.
standard population. (•) = incidence data and (x) = reporting-adjusted data.
Regression lines are calculated using the joinpoint regression program. EAPC =
estimated annual percentage change in the regression line. Numbers in
parentheses are the 95% confidence intervals of the EAPC. Data are from the
Surveillance, Epidemiology and End Results (SEER) program August 2000
submission.

Fig. 5. Incidence and reporting-adjusted rates for lung/bronchus cancer by race
and sex. Rates are per 100 000 person-years and are age-adjusted to the 1970
U.S. standard population. (•) = incidence data and (x) = reporting-adjusted
data. Regression lines are calculated using the joinpoint regression program.
EAPC = estimated annual percentage change in the regression line. Numbers in
parentheses are the 95% confidence intervals of the EAPC. Data are from the
Surveillance, Epidemiology and End Results (SEER) program August 2000
submission.

Fig. 6. Incidence and reporting-adjusted rates for melanoma in whites by sex.
Rates are per 100 000 person-years and are age-adjusted to the 1970 U.S.
standard population. (•) = incidence data and (x) = reporting-adjusted data.
Regression lines are calculated using the joinpoint regression program. EAPC =
estimated annual percentage change in the regression line. Numbers in
parentheses are the 95% confidence intervals of the EAPC. Data are from the
Surveillance, Epidemiology and End Results (SEER) program August 2000
submission.

For lung/bronchus cancer in white females,
reporting-adjustment caused both a shift in the change point of
the trend (from 1990.5 to 1988) and an increasing EAPC rate of
1.2% (95% CI = 0.7% to 1.6%) from 1988 to 1996 rather than the
flat trend (EAPC = 0.5%, 95% CI = -0.1% to 1.1%) observed for
the unadjusted incidence in the most current diagnosis years
(from 1991 to 1998; Fig. 5). For melanoma in white males,
reporting-adjustment decreased the number of change points in
the incidence trend from 3 to 2 (Fig. 6); the
reporting-adjusted incidence trend continued increasing after
1980 at an EAPC rate of 4.1% (95% CI = 3.8% to 4.4%), whereas
the unadjusted incidence rates indicated a flat or downward
trend at an EAPC rate of -4.2% (95% CI = -11.1% to 3.3%) after
the 1996 diagnosis year.

Discussion 

In this article we have modeled reporting delay and reporting 
error to adjust the current cancer case counts for anticipated 
future corrections to the data. These reporting-adjusted case 
counts and the associated delay models are valuable in precisely 
determining current cancer incidence rates and trends and in 
monitoring the timeliness of data collection in cancer registries 
(Fig. 1). 

Although the SEER program allows about 2 years to collect
newly diagnosed cancer cases, our results show that, depending
on cancer site, it would take 4-17 years for 99% or more of the
cancer cases to be reported, with the incidence case counts
initially reported at the 2-year delay time accounting for just
88%-97% of the estimated final incidence case counts. The low
completeness percentage for prostate cancer (compared with
cancers other than melanoma) might reflect an increase in
diagnosis of prostate cancer in outpatient settings. Because
more cases were added to the case counts after the standard
2-year delay time than were removed, the net effect of
reporting-adjustment for cancer incidence case counts for the
cancer sites examined was to increase cancer incidence rates in
more current diagnosis years. The percent change between
reporting-adjusted and unadjusted cancer incidence rates for
1998 ranged from 3% for colorectal cancer (regardless of race or
sex) to 14% for melanoma (regardless of sex) and prostate cancer
in black males. Thus, our results suggest that ignoring
reporting delay and reporting error may result in the false
impression of a recent decline in cancer incidence when the
apparent decline is, in fact, caused by delayed reporting of the
most recently diagnosed cases.

Reporting-adjusted incidence rates help in more accurately
estimating current trends, even if the change in the rates
appears to be slight. For example, there is concern over a
recent rise in the incidence of breast cancer, especially for
white females (5). Although the reporting-adjusted incidence
trends for breast cancer in white females in our study did not
reveal any new joinpoints in the incidence trend, the adjusted
incidence trend since 1987 was one and a half times as large as
the unadjusted incidence trend (an EAPC rate of 0.6% versus
0.4%). This reporting-adjusted trend is statistically
significantly different from zero, whereas the observed trend
(i.e., unadjusted for reporting) was not. Research efforts to
explain the cause for the recent rise in breast cancer incidence
rates (e.g., increases in screening mammographic examination
rates, change in risk factors) are warranted.

For colorectal cancer there appear to be recent changes (albeit
not statistically significant) in the incidence trends for
whites occurring after 1995: the reporting-adjusted EAPC rate
for the most recent incidence trend for white males is more than
10 times that of the trend for the observed data (0.7% versus
0.07%), whereas the reporting-adjusted incidence trend for
females is more than three times that of the trend for the
observed data (an EAPC rate of 2.8% versus 0.9%). However, one
must be cautious in the interpretation of these results because
the 95% CIs around these recent trends are quite large (see
Fig. 3). Analyses of Medicare claims data reveal that there is a
recent upswing in the rates of polypectomies (i.e., endoscopic
removal of small colorectal polyps) (Brown M: personal
communication), and having the best possible estimate of recent
cancer incidence trends is important to help quantify the
relationship between changes in medical practice and their
impact on trends.

The reporting-adjusted trends for melanoma in white males
revealed a continued increase in the incidence rates after 1980,
whereas the trends in the observed data showed a downturn after
1997. Delayed reporting of the most recently diagnosed melanoma
cases causes the false impression of a decline in the unadjusted
trend.

Prostate cancer incidence trends have been under special 
scrutiny because the prostate-specific antigen (PSA) test- 
induced rise and fall in prostate cancer incidence rates from 1969 
to 1995 have important implications concerning the operating 
characteristics of PSA testing in the community setting. If there 
is no overdiagnosis of prostate cancer through PSA testing, then 
prostate cancer incidence should return to its underlying back¬ 
ground trend. Interestingly, our study shows that since 1995 the 
reporting-adjusted incidence trend for prostate cancer in white 
males has almost returned to its pre-PSA test-induced rise of 
approximately 3% per year (see Fig. 4). However, one must 
consider what the background incidence trend for prostate can¬ 
cer would have been in the absence of the introduction of PSA 
screening, especially because incidental cancers detected by 
transurethral resection of the prostate have declined in recent 
years because of the introduction of drugs for the medical management
of prostatic hypertrophy.

SEER cancer incidence rates are often used to validate statistical
modeling of how risk factors and medical advances influence
population trends. For example, the NCI is sponsoring a
cooperative group of scientists known as the Cancer Intervention 
and Surveillance Modeling Network (CISNET). CISNET investigators
are modeling the impact of cancer control interventions,
such as treatment, screening, and prevention, on trends in breast, 
prostate, and colorectal cancer incidence and mortality and will 
eventually do so for lung cancer. These statistical models are to 
be validated using SEER cancer incidence trends. Therefore, 
accurate representations of current trends in cancer incidence are 
crucial to these modeling efforts. The NCI annual report (SEER
Cancer Statistics Review; available at http://seer.cancer.gov)
now includes the reporting-adjusted cancer incidence rates and
trends (in graphs similar to Figs. 2-6) for the five cancer sites we
examined.

Although we modeled reporting delay and reporting error in 
SEER data, cancer registries (and registries for other diseases) 
in the United States and throughout the world could also adapt
this method. A SAS/IML macro has been developed to perform
these types of modeling calculations and is available upon
request from the authors. Such calculations require archived data
sets from each year's data submission, so that sequential data
releases can be compared and the data arranged in the same form
as in Table 1.
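
A minimal sketch of this comparison step is given below. It assumes that each archived submission can be summarized as case counts by diagnosis year; the exact layout of Table 1 is not reproduced here, and the function names and all counts are hypothetical. It is not the SAS/IML macro mentioned above.

```python
def delay_table(releases):
    """Rearrange {submission_year: {diagnosis_year: count}} into
    {diagnosis_year: {delay_in_years: count}}."""
    table = {}
    for sub_year in sorted(releases):
        for dx_year, count in releases[sub_year].items():
            table.setdefault(dx_year, {})[sub_year - dx_year] = count
    return table

def net_changes(row):
    """Net change in the count between consecutive submissions for one
    diagnosis year (late additions minus later removals)."""
    delays = sorted(row)
    return {d2: row[d2] - row[d1] for d1, d2 in zip(delays, delays[1:])}

# Hypothetical archived submissions; the first count for a diagnosis year
# appears after the standard 2-year delay.
releases = {
    1998: {1996: 950},
    1999: {1996: 980, 1997: 940},
    2000: {1996: 985, 1997: 975, 1998: 930},
}

table = delay_table(releases)
print(table[1996])
print(net_changes(table[1996]))
```

Note that a net change computed this way cannot separate delayed additions from corrections of earlier errors; doing so would need case-level comparison of the archived files.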

Reporting delay models have previously been used in the 
reporting of AIDS cases (6-9). However, these models have 
generally not explicitly modeled reporting errors and have only 
modeled reporting delays (Green TA: personal communication). 
This approach could lead to biased estimates of cancer incidence 
rates and trends, because reporting errors in more recent diagnosis
years are less likely to have been corrected than those in
earlier diagnosis years. Furthermore, not explicitly modeling
reporting error would underestimate the variation of the estimates
for delay distributions that results from both reporting delay and 
reporting error. 

An important limitation of our reporting-adjustment model is
that, as with all other delay models, it does not account for cases
that are never reported. Underreporting could be a serious problem
in monitoring cancer incidence trends if case finding at
cancer registries is incomplete. Case finding is, therefore, an 
important aspect of data quality control for the SEER program. 
In addition, if reporting delays and reporting errors do not balance
each other out after 19 years, the likely consequence is that
unadjusted incidence rates prior to 1981 are downwardly biased
as the result of both reporting delays (i.e., downward bias) and
reporting errors (i.e., upward bias).

In summary, this study provides the first known research on 
the impact of both reporting delay and reporting error on cancer
incidence rates and trends. The reporting-adjusted case counts
and the associated delay models are valuable in precisely determining
current cancer incidence rates and trends and in monitoring
the timeliness of data collection at cancer registries.
Notes

* Editor's Note: SEER is a set of geographically defined, population-based,
central cancer registries in the United States, operated by local nonprofit
organizations under contract to the National Cancer Institute (NCI). Registry data are
submitted electronically without personal identifiers to the NCI on a biannual
basis, and the NCI makes the data available to the public for scientific research.